Motion-compensated television coders generate data at a nonuniform
rate, which is smoothed by a buffer for transmission over a channel
of constant bit rate. Spatial subsampling is one of the methods used
to prevent overflow of the buffer, which would otherwise occur for
scenes with complex motion from frame to frame. In this paper we
evaluate the effects of spatial subsampling on the performance of
motion-compensated coders. In particular, we find that, although the
quality of motion estimation does degrade in the presence of
subsampling, the degradation is not substantial. Use of 2:1
horizontal subsampling, for example, results in bit rates that are 50
percent lower compared with no subsampling for motion-compensated
coders. This percentage is approximately the same for the
conditional-replenishment coders. Spatial subsampling generally
results in blurring of the picture. We describe a technique for
adaptive interpolation that results in blurring of only the
unpredictable areas of the picture. The subjective quality in the
presence of subsampling is thus improved considerably. In conclusion,
our techniques for subsampling in a motion-compensated coder reduce
the bit rate approximately by the same factor as subsampling in a
conditional-replenishment coder, but result in a much better picture
quality.

I. INTRODUCTION 

Television signals contain a significant amount of frame-to-frame 
redundancy. Interframe coders attempt to exploit this redundancy by 
(i) Segmenting each television frame into two parts, one part that 
is predictable from the previous data, and one part that is unpredict- 
able. 

(ii) Transmitting two types of information: (a) addresses specifying
the location of the picture elements in the unpredictable area, and (b)
information (usually quantized prediction error) by which the
intensities of the unpredictable area can be updated.

(iii) Matching the coder bit rate to the channel rate. Since the
motion in a real television scene occurs randomly and in bursts, the 
amount of information about the unpredictable area will change as a 
function of time. It is transmitted over a constant bit-rate channel by, 
(a) storing it in a buffer prior to transmission to smooth out the 
transmitted information rate, and (b) using the buffer fullness to 
regulate the encoded bit rate by varying the amplitude, spatial, and 
temporal resolution of the television signal. Intensities of the picture 
elements (pels) in the unpredictable areas are transmitted by
predictive coding. In conditional-replenishment coding (Refs. 1-4),
quantized values of frame difference, element difference, and line
difference (or a combination thereof) are transmitted. In
motion-compensated coders (Refs. 5-9),
estimates of interframe translation of objects are obtained, and more 
efficient predictive coding is performed by taking differences of ele- 
ments from the previous frame that are appropriately translated. The 
translation is equal to the displacement of the object. In our previous
papers (Refs. 5-9) we described several methods of displacement
estimation and locally adaptive prediction to reduce the bit rate of
interframe coders. The displacement estimation methods were recursive:
they minimized the motion-compensated prediction error by a
steepest-descent algorithm.

As mentioned before, most interframe coders require a buffer to 
smooth the output of the coder for transmission over a channel. One 
method of reducing the size of the buffer or preventing buffer overflow 
is to control the resolution by spatial subsampling. Typically, if the 
buffer starts to fill rapidly, resolution is decreased; resolution is in- 
creased if the buffer is nearly empty. It is not known how to optimally 
control the resolution of the unpredictable area for a given channel-bit 
rate and buffer size. However, many excellent resolution-control
algorithms have been designed on a trial-and-error basis. In this paper,
we are concerned with spatial subsampling in motion-compensated 
coders, for the purpose of 

(i) Investigating to what extent spatial subsampling adversely 
affects the recursive-displacement estimation algorithm 

(ii) Modifying the displacement estimator to increase its efficiency 
in the presence of subsampling 

(iii) Evaluating a new algorithm for adaptive interpolation that
blurs only the unpredictable area (as compared with the "moving 
area") in a conditional-replenishment coder 

(iv) Presenting simulation results on synthetic scenes with known
displacement, and real scenes containing complex motion. 

Our simulations are restricted to horizontal subsampling by factors 
of 2:1 and 4:1. Simulations indicate that spatial subsampling does
degrade our displacement estimation. However, the degradation is not 
very serious. It is known (and confirmed by our simulations) that 2:1 
and 4:1 subsampling reduces the bit rates of conditional-replenishment 
coders approximately by a factor of two and four, respectively. We
found that 2:1 and 4:1 subsampling reduces the bit rates of
motion-compensated coders by similar factors. In most
conditional-replenishment coders, subsampling
blurs the "moving areas" (i.e., pels for which amplitude of the frame 
difference is above a certain threshold); our adaptive interpolation 
algorithm blurs only the unpredictable areas. Thus, improvement in 
prediction reduces the blurred areas, thereby improving the picture 
quality. 
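
The buffer-driven resolution control mentioned in this introduction can be illustrated with a small sketch. The function name, the hysteresis thresholds (0.75 and 0.25), and the one-step-at-a-time policy are assumptions made for illustration only; the text states just that resolution is decreased when the buffer fills rapidly and increased when it is nearly empty, and that practical controllers were designed by trial and error.

```python
# Hypothetical hysteresis rule for choosing the horizontal subsampling factor
# from buffer fullness. The thresholds are illustrative, not from the paper.
FACTORS = [1, 2, 4]  # no subsampling, 2:1, 4:1


def next_subsampling_factor(fullness, current, high=0.75, low=0.25):
    """Return the subsampling factor to use for the next field.

    fullness: buffer occupancy as a fraction in [0, 1].
    current:  factor in use now (1, 2, or 4).
    """
    i = FACTORS.index(current)
    if fullness > high and i < len(FACTORS) - 1:
        i += 1  # buffer filling: lower the spatial resolution
    elif fullness < low and i > 0:
        i -= 1  # buffer nearly empty: restore spatial resolution
    return FACTORS[i]
```

The gap between the two thresholds keeps the factor from oscillating when the fullness hovers around a single switching point.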

II. ALGORITHM 

In this section, we describe the modifications to our displacement 
estimation algorithm. It is worthwhile, however, to look at the basic 
algorithm described in our earlier works (Ref. 5). Let $I(\mathbf{x}_k, t)$
denote the intensity of a scene at the $k$th sample point $\mathbf{x}_k$ in
the scanning order of a frame, and let $I(\mathbf{x}_k, t - \tau)$ denote the
intensity at the same spatial location in the previous frame. If the scene
consists of an object that is undergoing pure translation under uniform
illumination, then, disregarding the background,

$$I(\mathbf{x}_k, t) = I(\mathbf{x}_k - \mathbf{D}, t - \tau), \tag{1}$$

where $\mathbf{D}$ is the displacement (two-component vector) of the object in
one frame interval, $\tau$. The pel-recursive algorithm obtains an estimate
of $\mathbf{D}$ (i.e., $\hat{\mathbf{D}}$) by recursively minimizing the square of the
displaced-frame difference at the current pel location. The displaced-frame
difference $DFD(\cdot, \cdot)$ is defined by

$$DFD(\mathbf{x}_k, \hat{\mathbf{D}}) = I(\mathbf{x}_k, t) - I(\mathbf{x}_k - \hat{\mathbf{D}}, t - \tau). \tag{2}$$

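The minimization is performed by a steepest-descent algorithm of
the form

$$\hat{\mathbf{D}}_{k+1} = \hat{\mathbf{D}}_k - \tfrac{\epsilon}{2}\, \nabla_{\mathbf{D}} \left[ DFD(\mathbf{x}_{k+1}, \hat{\mathbf{D}}_k) \right]^2, \tag{3}$$

where $\nabla_{\mathbf{D}}[\cdot]$ is the two-dimensional gradient with respect to
$\mathbf{D}$. Equation (3) can be expanded to

$$\hat{\mathbf{D}}_{k+1} = \hat{\mathbf{D}}_k - \epsilon\, DFD(\mathbf{x}_{k+1}, \hat{\mathbf{D}}_k)\, \nabla I(\mathbf{x}_{k+1} - \hat{\mathbf{D}}_k, t - \tau), \tag{4}$$

where $\nabla = \nabla_{\mathbf{x}}$ is the two-dimensional spatial-gradient operator with
respect to horizontal and vertical coordinates of vector $\mathbf{x}$. We use a
finite-difference approximation for the gradient, which is formed by
element difference, $EDIF_k$, and line difference, $LDIF_k$, using the pel
closest to the point $\mathbf{x}_{k+1} - \hat{\mathbf{D}}_k$ in the previous frame. The above
displacement estimator requires multiplication at each iteration, which
is undesirable for hardware implementation and is therefore simplified
to:

$$\hat{\mathbf{D}}_{k+1} = \hat{\mathbf{D}}_k - \epsilon\, \operatorname{sgn}\left[ DFD(\mathbf{x}_{k+1}, \hat{\mathbf{D}}_k) \right] \cdot \operatorname{sgn}\left[ \nabla I(\mathbf{x}_{k+1} - \hat{\mathbf{D}}_k, t - \tau) \right], \tag{5}$$

where

$$\operatorname{sgn}(z) = \begin{cases} -1, & \text{if } z < -T \\ 0, & \text{if } |z| \le T \\ +1, & \text{otherwise.} \end{cases} \tag{6}$$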
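
The step from the gradient form of eq. (3) to the expanded form of eq. (4) follows from the chain rule. The short derivation below assumes that the step-size factor in eq. (3) is $\epsilon/2$, which is the value that makes the two forms agree:

```latex
\begin{align*}
\nabla_{\mathbf{D}}\left[DFD(\mathbf{x},\mathbf{D})\right]^2
  &= 2\,DFD(\mathbf{x},\mathbf{D})\,\nabla_{\mathbf{D}}DFD(\mathbf{x},\mathbf{D}),\\
\nabla_{\mathbf{D}}DFD(\mathbf{x},\mathbf{D})
  &= -\nabla_{\mathbf{D}}\,I(\mathbf{x}-\mathbf{D},\,t-\tau)
   = +\nabla I(\mathbf{x}-\mathbf{D},\,t-\tau),\\
\frac{\epsilon}{2}\,\nabla_{\mathbf{D}}\left[DFD(\mathbf{x},\mathbf{D})\right]^2
  &= \epsilon\,DFD(\mathbf{x},\mathbf{D})\,\nabla I(\mathbf{x}-\mathbf{D},\,t-\tau).
\end{align*}
```

The sign change in the second line comes from differentiating the argument $\mathbf{x}-\mathbf{D}$ with respect to $\mathbf{D}$.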

The above recursion to update $\hat{\mathbf{D}}_k$ is carried out only in the moving
areas of the current frame, i.e., for those pels where

$$\sum_j \left| I(\mathbf{x}_{k+j}, t) - I(\mathbf{x}_{k+j}, t - \tau) \right| > \text{Threshold}.$$

The motion-compensated coder predicts intensity, $I(\mathbf{x}_k, t)$, using either
the previous frame intensity, $I(\mathbf{x}_k, t - \tau)$, or displaced-previous-frame
intensity, $I(\mathbf{x}_k - \hat{\mathbf{D}}_{k-1}, t - \tau)$, based on which predictor results in less
error for certain already transmitted neighbors. We note that if a point
$\mathbf{x} - \hat{\mathbf{D}}$ does not lie on the grid formed by the pels, then interpolation
is required to evaluate $I(\mathbf{x}_k - \hat{\mathbf{D}}_{k-1}, t - \tau)$. The displacement at either
the previous pel or the previous line element is used to form the 
displaced-previous-frame prediction of the present pel. This allows the 
receiver to compute displaced-previous-frame predictions without ex- 
plicit transmission of the displacement. If the magnitude of the predic- 
tion error exceeds a predetermined threshold, the coder transmits a 
quantized version of the prediction error and the necessary addressing 
information to the receiver. 
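
A minimal one-dimensional sketch of the sign-based recursion of eqs. (5) and (6), restricted to horizontal displacement along one scan line, may make the update concrete. The function names, the linear interpolation between pels, the central-difference form of the element difference, the step size `eps = 1/16`, and the dead zone `T = 0.5` are all assumptions made for illustration; the moving-area test and the predictor switch are omitted.

```python
import math


def sgn(z, T=0.0):
    """Three-level sign function of eq. (6) with dead zone T."""
    if z < -T:
        return -1
    if abs(z) <= T:
        return 0
    return 1


def sample(line, x):
    """Linearly interpolate `line` at real position x (clamped to the ends)."""
    x = min(max(x, 0.0), len(line) - 1.0)
    i = min(int(math.floor(x)), len(line) - 2)
    frac = x - i
    return (1.0 - frac) * line[i] + frac * line[i + 1]


def estimate_displacement(cur, prev, eps=1 / 16, T=0.5, d0=0.0):
    """Run the sign-based update once along a scan line; return the final estimate."""
    d = d0
    for k in range(1, len(cur) - 1):
        # Displaced-frame difference at pel k using the previous estimate.
        dfd = cur[k] - sample(prev, k - d)
        # Element difference at the previous-frame pel closest to k - d.
        n = min(max(int(round(k - d)), 1), len(prev) - 2)
        edif = 0.5 * (prev[n + 1] - prev[n - 1])
        d -= eps * sgn(dfd, T) * sgn(edif, T)
    return d


# Usage: a ramp shifted right by 2 pels between frames.
prev_line = [3.0 * x for x in range(100)]
cur_line = [prev_line[max(x - 2, 0)] for x in range(100)]
d_hat = estimate_displacement(cur_line, prev_line)
```

Because each update changes the estimate by a fixed step `eps`, the estimate approaches the true displacement linearly and then stops once the displaced-frame difference falls inside the dead zone.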

Above we described the basic motion-compensation algorithm. We 
now describe its modifications relevant to the spatially subsampled 
television signal. Although the algorithm we describe below can be 
applied to signals subsampled in a variety of ways, we restrict ourselves 
to horizontal (along a scan line) subsampling by a factor of two and 
four to one. Modifications and details of each component of the 
motion-compensated system are given below. 

2.1 Displacement estimator

Figure 1 shows the pel arrangement used for the updating process of 
the displacement estimator. Since the subsampled and subsequently 
interpolated pels contain more noise, they are given less importance in 
the updating process. This is done in two ways. The frame-difference
signal at subsampled pels is given less weight in determining where
the displacement is updated, and a smaller updating constant ($\epsilon$)
is used at subsampled pels. Thus, the estimator works as follows:

$$\hat{\mathbf{D}}_k = \hat{\mathbf{D}}_{k-1} - \epsilon\, \operatorname{sgn}\left[ DFD(\mathbf{x}_k, \hat{\mathbf{D}}_{k-1}) \right] \cdot \operatorname{sgn} \begin{bmatrix} EDIF_k \\ LDIF_k \end{bmatrix}, \tag{7}$$

where

$$\epsilon = \begin{cases} \epsilon_1, & \text{if pel } \mathbf{x}_k \text{ is subsampled} \\ \epsilon_2, & \text{otherwise.} \end{cases} \tag{8}$$

Fig. 1. Pel configuration for a displacement-updating process. Displacement is
recursively updated only at those pels where the weighted sum of frame difference in a
window around pel k is above a threshold.



We have found that the estimator performance is improved by choosing
$\epsilon_1 < \epsilon_2$ (both positive numbers). The recursion is carried out only
at those pels where a weighted sum of the magnitude of the frame
difference in a window around pel $k$ exceeds a given threshold. Thus,
$\hat{\mathbf{D}}_{k-1}$ is updated only if

$$\sum_{j=-p}^{+p} w_j \left| FDIF(\mathbf{x}_{k+j}) \right| > THRESH_1; \tag{9}$$

otherwise $\hat{\mathbf{D}}_k = \hat{\mathbf{D}}_{k-1}$. The weights $\{w_j\}$ are nonnegative and are lower
for subsampled pels than for nonsubsampled pels. The
threshold, $THRESH_1$, is preselected and optimized by simulations,
and the displacement $\hat{\mathbf{D}}_k$ is used for the prediction of the element at
location $k$ in the next line (see Fig. 1).
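
The update gate of eq. (9) and the step-size choice of eq. (8) can be sketched as follows. The window half-width `p = 1`, the threshold of 4, and the weights 1 (subsampled) and 2 (nonsubsampled) are the values reported in Section 3.2; the function names and the two step sizes `eps_sub` and `eps_full` are hypothetical placeholders chosen only to satisfy the stated ordering (smaller step at subsampled pels).

```python
def should_update(fdif, k, subsampled, p=1, thresh=4.0, w_sub=1.0, w_full=2.0):
    """Eq. (9): update at pel k only if the weighted frame-difference sum is large.

    fdif:       frame differences along the scan line.
    subsampled: booleans, True where the pel was subsampled (then interpolated).
    """
    total = 0.0
    for j in range(-p, p + 1):
        i = k + j
        if 0 <= i < len(fdif):  # ignore window positions off the line
            w = w_sub if subsampled[i] else w_full
            total += w * abs(fdif[i])
    return total > thresh


def step_size(is_subsampled, eps_sub=1 / 64, eps_full=1 / 32):
    """Eq. (8): smaller step at subsampled pels (both values are placeholders)."""
    return eps_sub if is_subsampled else eps_full
```

For example, with `fdif = [2, 1, 2]` and only the middle pel subsampled, the weighted sum at `k = 1` is 2*2 + 1*1 + 2*2 = 9, which exceeds the threshold, so the estimate is updated there.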

2.2 Predictor and predictor selection 

As in our earlier works, we have used only predictors based on 
intensities in the previous frame. The previous-frame and displaced- 
previous-frame predictions are calculated for each pel, whether
subsampled or not. Of course, other predictions (e.g., intrafield predictors
such as previous element or line) can be used to augment our prediction 
strategy. Having computed both the previous-frame and displaced- 
frame predictors, we use the following rule to switch adaptively
between them on a pel-by-pel basis. Referring to Fig. 1, we use the
frame-difference predictor if

$$\sum_{j=-m}^{+m} w_j \left| FDIF(\mathbf{x}_{k+j}) \right| < \sum_{j=-m}^{+m} w_j \left| DFD(\mathbf{x}_{k+j}, \hat{\mathbf{D}}_k) \right|; \tag{10}$$

otherwise we use the displaced-frame predictor. The above inequality
is evaluated for a window of size $(2m + 1)$ pels centered around pel $k$.
Again, less weight is given to pels that are subsampled, i.e., $w_j$ is lower
for subsampled pels. In the calculation of $FDIF(\cdot)$ and $DFD(\cdot, \cdot)$,
the use of interpolated pels in the previous frame may sometimes be
required.
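
The switching rule of eq. (10) can be sketched in a few lines. The window half-width `m = 1` and the weights 1 (subsampled) and 4 (nonsubsampled) are the values reported in Section 3.2; the function names and the string labels returned are illustrative assumptions, and the inputs are taken to be precomputed frame differences and displaced-frame differences for one line.

```python
def weighted_window_sum(values, k, subsampled, m, w_sub, w_full):
    """Weighted sum of |values| over the (2m + 1)-pel window centered on pel k."""
    total = 0.0
    for j in range(-m, m + 1):
        i = k + j
        if 0 <= i < len(values):  # ignore window positions off the line
            total += (w_sub if subsampled[i] else w_full) * abs(values[i])
    return total


def select_predictor(fdif, dfd, k, subsampled, m=1, w_sub=1.0, w_full=4.0):
    """Eq. (10): choose the predictor with the smaller weighted error sum."""
    frame_cost = weighted_window_sum(fdif, k, subsampled, m, w_sub, w_full)
    displaced_cost = weighted_window_sum(dfd, k, subsampled, m, w_sub, w_full)
    if frame_cost < displaced_cost:
        return "previous-frame"
    return "displaced-frame"
```

Because the inequality in eq. (10) is strict, a tie falls to the displaced-frame predictor in this sketch.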

2.3 Subsampling and interpolation 

We considered several patterns for subsampling. Some patterns were 
such that they did not change from line to line, field to field, or frame 
to frame, whereas some were intentionally staggered. Some of these 
are described in Section 2.4. Having selected a subsampling pattern,
we then interpolated the missing pels using an adaptive technique. If 
the magnitude of the prediction error for the nearest nonsubsampled 
pels to the right and left was below a threshold, then the intensity of 
the subsampled pel was replaced by its prediction. If, on the other 
hand, either the closest right or left nonsubsampled neighbor had 
prediction-error magnitude above the threshold, then a simple linear 
(horizontal) interpolation was used to reconstruct the intensity of the 
subsampled pel. The threshold used is the same as the one that
determines whether the quantized prediction error is transmitted. This type of
adaptive interpolation improves with the quality of prediction. The 
conditional-replenishment coders subsample the "moving area" pels, 
which are determined by the frame difference signal. This has the 
effect of blurring the entire moving area even if more sophisticated 
predictors are used. Our strategy, on the other hand, replaces all the 
"predictable" pels (as determined by the closest neighbors) by their 
prediction rather than by spatial interpolation. Thus, only the unpre- 
dictable areas are blurred by spatial interpolation. 
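
The adaptive interpolation rule described above can be sketched for one scan line. The function name and argument layout are assumptions made for illustration, as is the treatment of two cases the text does not spell out: a prediction error exactly equal to the threshold is treated as "not below" it, and a subsampled pel with a transmitted neighbor on only one side copies that neighbor when interpolation is needed.

```python
def adaptive_interpolate(recon, pred, pred_err, kept, thresh):
    """Fill in subsampled pels on one line.

    recon:    reconstructed intensities (valid where kept[k] is True).
    pred:     predicted intensity for every pel.
    pred_err: prediction error at the kept (nonsubsampled) pels.
    kept:     True for nonsubsampled pels, False for subsampled pels.
    """
    out = list(recon)
    n = len(out)
    for k in range(n):
        if kept[k]:
            continue
        # Nearest nonsubsampled neighbors to the left and right.
        left = next((i for i in range(k - 1, -1, -1) if kept[i]), None)
        right = next((i for i in range(k + 1, n) if kept[i]), None)
        neighbors = [i for i in (left, right) if i is not None]
        if all(abs(pred_err[i]) < thresh for i in neighbors):
            out[k] = pred[k]  # predictable: use the prediction, no blur
        elif left is None or right is None:
            out[k] = out[neighbors[0]]  # line edge: copy the one neighbor
        else:
            t = (k - left) / (right - left)
            out[k] = (1 - t) * out[left] + t * out[right]  # linear interpolation
    return out


# Usage: 2:1 subsampling, odd pels dropped, replenishment threshold 3.
line = adaptive_interpolate(
    recon=[10, 0, 20, 0, 40],
    pred=[9, 15, 21, 33, 39],
    pred_err=[1, 0, 1, 0, 9],
    kept=[True, False, True, False, True],
    thresh=3,
)
# Pel 1 takes its prediction (15); pel 3 is interpolated between 20 and 40.
```

Only pel 3 is spatially interpolated here, because one of its neighbors had a prediction error above the threshold; this is the sense in which blurring is confined to the unpredictable area.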

2.4 Transmitted information 

As previously mentioned, we transmitted to the receiver the quan- 
tized prediction error of every nonsubsampled pel where magnitude of 
prediction error was above a threshold, called the replenishment 
threshold. This classifies pels into predictable and unpredictable pels. 
A 35-level symmetric quantizer was used with representative levels at 
0, 3, 6, 11, 16, 21, 28, 35, 44, 53, 64, 77, 92, 109, 128, 149, 178, and 197 
(on an 8-bit scale of 0 to 255). We obtained decision levels by averaging
the adjacent representative levels. A code set for representing the 
quantizer levels was not designed, but entropies were computed. In 
addition, horizontal run lengths of predictable and unpredictable pels 
were transmitted. Here again, entropies of the predictable- and unpre- 
dictable-pel run lengths were calculated. This assumes that in practice 
separate code sets will be used for run lengths of predictable and 
unpredictable pels. 
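
A short sketch of the quantizer and of the entropy measure used in place of a designed code set is given below. The representative magnitudes are copied from the list above and the decision levels are midpoints of adjacent representative levels, as stated; the function names and the choice to send an error lying exactly on a decision level to the larger magnitude are assumptions.

```python
import bisect
import math
from collections import Counter

# Representative magnitudes from the text; with both signs this gives 35 levels.
LEVELS = [0, 3, 6, 11, 16, 21, 28, 35, 44, 53, 64, 77, 92, 109, 128, 149, 178, 197]
# Decision levels: averages of adjacent representative levels.
DECISIONS = [(a + b) / 2 for a, b in zip(LEVELS, LEVELS[1:])]


def quantize(error):
    """Map a prediction error to the nearest representative level, keeping sign."""
    idx = bisect.bisect_right(DECISIONS, abs(error))
    level = LEVELS[idx]
    return level if error >= 0 else -level


def entropy_bits(symbols):
    """First-order entropy, in bits per symbol, of a sequence of symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())
```

Applying `entropy_bits` separately to the quantizer outputs, the predictable-pel run lengths, and the unpredictable-pel run lengths mirrors the assumption in the text that separate code sets would be used for each.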

III. SIMULATIONS AND RESULTS 

Computer simulations were performed on two types of scenes. The 
first was a synthetic scene that was computer-generated. It was a 
damped radial cosine in intensity with a radius of 60 pels, which 
translated from frame to frame by a given amount. The pattern is 
described mathematically by the intensity function 

$$I(R) = 100 \exp(-0.01R) \cos(2\pi R / P), \quad R < 60,$$

where $R$ is the radial distance from the center [taken to be (100,100)]
and

$$P = (1 - R/60) \cdot 10 + 10.$$

This function is displayed on a 256- by 256-element raster in two 
interlaced fields of 128 lines each. The pattern is shown as Fig. 6 in 
Ref. 7. The other scene, called Judy, is a head and shoulders view of 
a person engaged in active conversation. This consisted of 50 frames 
obtained by taking a Nyquist-rate sampling of a video signal having a 
1-MHz bandwidth. Each sample was quantized uniformly to 8 bits. 
Four frames of this sequence are shown in Fig. 4 of Ref. 5. 
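
The synthetic pattern can be regenerated with a few lines of code. This sketch assumes the garbled formula reads I(R) = 100 exp(-0.01 R) cos(2 pi R / P) with P = (1 - R/60) * 10 + 10 for R < 60; the background value of 0 outside the pattern, the function names, and the purely horizontal translation are further assumptions, and the split into interlaced fields is not modeled.

```python
import math


def pattern_intensity(x, y, cx=100.0, cy=100.0, radius=60.0):
    """Damped radial cosine; 0 outside the radius (background value assumed)."""
    R = math.hypot(x - cx, y - cy)
    if R >= radius:
        return 0.0
    P = (1.0 - R / radius) * 10.0 + 10.0  # period shrinks from 20 to 10 pels
    return 100.0 * math.exp(-0.01 * R) * math.cos(2.0 * math.pi * R / P)


def make_frame(shift_x=0, size=256):
    """Raster of the pattern translated horizontally by shift_x pels."""
    return [
        [pattern_intensity(x, y, cx=100.0 + shift_x) for x in range(size)]
        for y in range(size)
    ]


# Usage: two successive frames with a known displacement of 4 pels per frame.
frame0 = make_frame(0)
frame1 = make_frame(4)
```

Because the displacement is set explicitly, the estimator's output can be compared against a known true value, which is the purpose of the synthetic scene.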

3.1 Synthetic Scene

The simulations on the synthetic scene were restricted to 2:1 sub- 
sampling with a subsampling pattern that did not change from line to 
line and field to field (referred to as a nonstaggered pattern). The 
purpose of this simulation was to evaluate the degradation in the 
performance of the displacement estimator. Therefore, only the dis- 
placement was calculated, without using it for coding. Figures 2 and 3 
show the relative displacement error as a function of the iteration 
number. The iteration number in this case refers to only those in- 
stances where the displacement was actually updated. Owing to several 
factors [e.g., the setting of the threshold in eq. (9)], a given iteration 
number in a subsampled case may not be at the same location in the 
nonsubsampled case. Figure 2 shows the case when the pattern moves at
4 pels per frame, whereas Fig. 3 shows the results for displacement of
5 pels per frame. The parameters of the displacement estimator (of
Section II) were selected by trial and error. It is seen from both figures
that the convergence of the displacement estimator is more rapid when
there is no subsampling. The degradation appears to be somewhat less
for the 5-pel/frame case as compared with the 4-pel/frame case. The
effect of this degradation on the motion-compensated coder is
evaluated in Section 3.2.

Fig. 2. Relative error in displacement for a synthetic moving pattern at 4 pels per
frame as a function of iteration number.

Fig. 3. Relative error in displacement for a synthetic moving pattern at 5 pels per
frame as a function of iteration number.

1902 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1982

Fig. 4. Bits required per field for conditional-replenishment scheme using a fine
(511-level) quantizer. [Plot: horizontal axis, field number; labeled curve: conditional
replenishment (no subsampling).]



TELEVISION CODERS 1903 



Fig. 5. Bits required per field for motion-compensation scheme using a fine
(511-level) quantizer. [Plot: horizontal axis, field number; labeled curves: motion
compensation (no subsampling) and motion compensation (4:1 subsampling).]

3.2 Real Scene 

Before the performance of the subsampled motion-compensated 
coders is given, it is important to note the parameters that were chosen 
for simulations. These choices were based on trial and error. While 
this may not have resulted in optimum choices, our choices may not 
be too far from the optimum. 

The $\epsilon$ of eq. (8) was different for subsampled pels compared
with nonsubsampled pels; $\epsilon_1$ was Vm, and $\epsilon_2$ was V62. The parameters in
the update condition of eq. (9) were taken to be $p = 1$ and $THRESH_1 = 4$
(on an 8-bit scale of 0 to 255). The weight given to subsampled pels
was 1 and to nonsubsampled pels was 2. Thus, through $\epsilon$ and these
weights, subsampled pels were given less "importance" in the
displacement estimation process. The predictor selection [eq. (10)] was done
by using a window of 3 (i.e., $m = 1$), and the subsampled pels were
given weight $w_j = 1$, whereas nonsubsampled pels were given weight
$w_j = 4$. The displacement estimator was not initialized at the beginning
of each scanning line; thus, the estimate from the last pel of the
previous line was used as the initial estimate for the first pel of the
next line.

Fig. 6. Bits required per field for conditional-replenishment scheme using a coarse
35-level quantizer.

The performance was measured by the number of coded bits that 
were required for each field (approximated by appropriate entropies). 
When the coarse quantizer of Section 2.4 was used, then the total 
squared-interpolation error was also calculated for each field. Figures 
4 and 5 show the coded bits for both conditional replenishment and 
motion compensation, when no coarse quantization was performed 
(i.e., quantizer with 511 levels was used). It is obvious that 2:1
subsampling reduces the bits/field by approximately a factor of two
for both conditional
replenishment and motion compensation. The decrease in bit rates is
high for those fields with a large amount of motion. The 4:1 subsam- 
pling reduces the bit rates of motion-compensation schemes much 
more significantly compared with conditional replenishment. However, 
this may be a peculiarity of the particular scene we used for simulation. 
We used a few more scenes and found that 4:1 subsampling, in general, 
reduced the rate by a factor of two compared with 2:1 subsampling for 
both conditional replenishment and motion compensation. 

Using the 35-level quantizer mentioned earlier, the bit rates are 
plotted in Figs. 6 and 7. Figure 6 shows conditional replenishment and 
Fig. 7 shows motion compensation. These figures indicate that motion 
compensation results in approximately 60-percent reduction for no 
subsampling, approximately 80-percent reduction for 2:1 subsampling, 
and about 90-percent reduction for 4:1 subsampling, compared with
conditional replenishment. Obviously these conclusions are
scene-dependent. However, for scenes containing significant translational
motion, such conclusions may remain valid. It is always difficult to
evaluate quality of short segments of scenes. We made informal
observations to compare pictures resulting from conditional
replenishment and motion compensation. Subsampling results in visible
blurring. However, it was found that, owing to our adaptive interpolation
scheme, motion compensation blurred a much smaller area than did
conditional replenishment. Also, the blurred areas in motion
compensation appear to be much more fragmented and somewhat randomly
distributed, which also decreases their visibility. Figures 8 and 9 show
the plots of squared-interpolation error per field versus the field
number for both 2:1 and 4:1 subsampling. Curves for both
frame-difference conditional replenishment and motion compensation are
shown. In the case of 2:1 subsampling, the interpolation error decreases
by almost a factor of two using motion compensation. This decrease is
even greater for 4:1 subsampling.

Fig. 7. Bits required per field for motion-compensation scheme using a coarse
35-level quantizer.

Fig. 8. Plot of squared-interpolation error summed over entire field versus field
number for 2:1 spatial subsampling. Adaptive interpolation is used in the case of motion
compensation. Fixed-linear one-dimensional spatial interpolation is used for conditional
replenishment.

Fig. 9. Plot of squared-interpolation error summed over an entire field versus field
number for 4:1 spatial subsampling.

In our simulations we also tried staggering the subsampling pattern
from line to line, field to field, and frame to frame. This makes the
quantization and the interpolation noise appear random and less 
patterned. However, field-to-field or frame-to-frame staggering results 
in annoying flicker, as expected. We thus found line-to-line staggering 
to be most useful. Although staggering improved the quality of pictures 
for both conditional replenishment and motion compensation, the 
improvement was somewhat higher for conditional replenishment. The 
required bits per field did not change significantly because of stagger- 
ing. Thus, we might conclude that line-to-line staggered-subsampling 
patterns improve the quality of pictures without any significant in- 
crease in the bit rates. 

IV. SUMMARY AND CONCLUSIONS 

We have presented in this paper schemes for motion compensation 
in the presence of spatial subsampling, which is required in interframe 
coders to prevent buffer overflow. We also described an adaptive inter- 
polation scheme that blurred only the "unpredictable" area during 
subsampling. Computer simulations were performed on synthetic
scenes to evaluate degradation of the displacement estimator in the 
presence of subsampling. It was found that, although the quality of 
displacement estimation was degraded in the presence of spatial sub- 
sampling, the effect on the bit rates was not significant. Compared 
with conditional replenishment, motion compensation reduced the bit 
rates by a factor of two or more even during subsampling. Thus, a 2:1 
subsampled motion-compensated coder results in about one quarter of 
the bit rate of conditional replenishment and about one half of the bit 
rate of the 2:1 subsampled conditional-replenishment coder. Our adap- 
tive interpolation scheme blurs only the "unpredictable area" rather 
than the "moving area" that is blurred in subsampled conditional- 
replenishment coders. Since the "unpredictable area" is a subset of the
moving area and is fragmented randomly, the blurring caused by subsampling
in the case of a motion-compensated coder is much less visible. This 
is also borne out by the total-interpolation error, which decreases by 
more than a factor of two using adaptive interpolation. 